 civil rights act


Revitalizing Saturated Benchmarks: A Weighted Metric Approach for Differentiating Large Language Model Performance

Etzine, Bryan, Hashemi, Masoud, Madhusudhan, Nishanth, Davasam, Sagar, Sharma, Roshnee, Madhusudhan, Sathwik Tejaswi, Yadav, Vikas

arXiv.org Artificial Intelligence

Existing benchmarks are becoming saturated and struggle to separate model performances due to factors like data contamination and advancing LLM capabilities. This paper introduces EMDM (Enhanced Model Differentiation Metric), a novel weighted metric that revitalizes benchmarks by enhancing model separation. EMDM integrates final answer and Chain-of-Thought (CoT) reasoning correctness, assigning weights based on the complexity and reasoning depth required to solve a given sample in the evaluation data. Using a baseline LLM in two setups (Unguided, where the model has no prior exposure to test samples, and Guided, where the model has prior knowledge of the desired answer), EMDM distinguishes instances of varying difficulty. The CoT and answer correctness from these setups inform an optimization objective for weight assignment, resulting in a more nuanced evaluation of model performance. Compared to the exact match (EM) metric, which achieves 17% separation on ARC-Challenge, EMDM achieves 46%, demonstrating its effectiveness in differentiating models based on reasoning and knowledge requirements.


AI chatbot 'hallucinations' perpetuate political falsehoods, biases that have rewritten American history

FOX News

Fox News correspondent Grady Trimble has the latest on fears the technology will spiral out of control on 'Special Report.' Artificial intelligence query platforms offer in many cases a hallucinatory hard-left version of politics and history. The same biases and outright lies that reshaped academia over the last 50 years and infected the American body politic with division are endemic throughout versions of historical events perpetuated by OpenAI's generative platform ChatGPT, according to a number of searches done by Fox News Digital. "Artificial Intelligence will simply reflect and magnify the mindset and ideology of its creators -- and impress those values upon the rest of us," Victor Davis Hanson, senior fellow at the Hoover Institution, told Fox News Digital. "In other words, we are creating Silicon Valley-minded Frankensteins and unleashing them on the nation," he said.


Conversation on racism and robotics

Robohub

Talking about racism and its impact on robotics and roboticists was the first conversation in our new biweekly online discussion series "Society, Robots and Us" on alternate Tuesdays at 6pm PDT. It was a generous, honest and painful discussion that I hope has left a lasting impact on everyone who listened. There is systemic racism in America, and this does have an impact on robotics and roboticists in many, many ways. US Senator Elizabeth Warren, in conversation today with Alicia Garza from Black Futures Lab, said, "America was founded on principles of liberty and freedom, but it was built on the backs of enslaved people. This is a truth we must not ignore. Racism and white supremacy have shaped every crucial aspect of our economy, and our political system for generations now."